[SROA][TTI][DirectX] Add support for struct alloca decomposition #161601
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-llvm-analysis

Author: Deric C. (Icohedron)

Changes

Fixes #147109 and #160773

This PR adds "struct alloca decomposition" to SROA for use by backends (namely DirectX / DXIL) that do not allow alloca instructions with struct types.

Inclusion of struct alloca decomposition in SROA

Struct alloca decomposition eliminates struct-based alloca instructions by replacing them with separate alloca instructions for each of the struct members. It can be thought of as an Array-of-Structs to Struct-of-Arrays transformation without the creation of a new struct to hold all the arrays.

This implementation currently supports only struct-based alloca instructions used by GEPs, memset, memcpy/memmove, and lifetime intrinsics.

SROA pass options changes

With this change, the SROA pass options have become a struct, so that struct alloca decomposition can be toggled alongside the existing preserve-cfg option.

TargetTransformInfo changes

To enable struct alloca decomposition when compiling for the DirectX / DXIL backend, an additional TargetTransformInfo entry is added; the diff below declares it as shouldDecomposeStructAllocas().

Notes

Struct alloca decomposition is unsafe in languages that allow dynamic indexing across struct members (e.g., treating a pointer to one member as a base for accessing others via computed offsets). Such behavior can violate the assumptions of this decomposition, which expects each member to be accessed independently and explicitly.

Patch is 54.29 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/161601.diff

11 Files Affected:
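The Array-of-Structs to Struct-of-Arrays analogy in the description can be illustrated at the source level. The following is only a sketch in plain C++, not the IR-level rewrite the PR performs; the `Particle` struct and both function names are hypothetical and exist purely for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical struct standing in for the element type of a struct alloca.
struct Particle {
  float Mass;
  int Id;
};

// Before: a single local array of structs, loosely analogous to one
// alloca of type [4 x %Particle].
float sumMassAoS() {
  std::array<Particle, 4> Ps{};
  for (std::size_t I = 0; I < Ps.size(); ++I) {
    Ps[I].Mass = static_cast<float>(I);
    Ps[I].Id = static_cast<int>(I);
  }
  float Sum = 0.0f;
  for (const Particle &P : Ps)
    Sum += P.Mass;
  return Sum;
}

// After: one local array per struct member, loosely analogous to separate
// allocas of type [4 x float] and [4 x i32]. No wrapper struct is created.
float sumMassDecomposed() {
  std::array<float, 4> Mass{};
  std::array<int, 4> Id{};
  for (std::size_t I = 0; I < Mass.size(); ++I) {
    Mass[I] = static_cast<float>(I);
    Id[I] = static_cast<int>(I);
  }
  float Sum = 0.0f;
  for (float M : Mass)
    Sum += M;
  return Sum;
}

// Both layouts must produce the same observable result.
void checkSketch() { assert(sumMassAoS() == sumMassDecomposed()); }
```

The two functions only agree because every access names its member explicitly. Code that computes a pointer to one member from a pointer to another would have no equivalent in the decomposed form, which is the hazard the Notes paragraph describes.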
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 7a4abe9ee5082..599ad3afd008c 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -1977,6 +1977,11 @@ class TargetTransformInfo {
/// target.
LLVM_ABI bool allowVectorElementIndexingUsingGEP() const;
+ /// \returns True if the target does not support struct allocas and therefore
+ /// requires struct alloca instructions to be scalarized / decomposed into
+ /// its components.
+ LLVM_ABI bool shouldDecomposeStructAllocas() const;
+
private:
std::unique_ptr<const TargetTransformInfoImplBase> TTIImpl;
};
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index 566e1cf51631a..6b8fc753580ac 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -1163,6 +1163,8 @@ class TargetTransformInfoImplBase {
virtual bool allowVectorElementIndexingUsingGEP() const { return true; }
+ virtual bool shouldDecomposeStructAllocas() const { return false; }
+
protected:
// Obtain the minimum required size to hold the value (without the sign)
// In case of a vector it returns the min required size for one element.
diff --git a/llvm/include/llvm/Transforms/Scalar.h b/llvm/include/llvm/Transforms/Scalar.h
index 8e68b6a57e51f..20c05687e1b4c 100644
--- a/llvm/include/llvm/Transforms/Scalar.h
+++ b/llvm/include/llvm/Transforms/Scalar.h
@@ -44,7 +44,8 @@ LLVM_ABI FunctionPass *createDeadStoreEliminationPass();
//
// SROA - Replace aggregates or pieces of aggregates with scalar SSA values.
//
-LLVM_ABI FunctionPass *createSROAPass(bool PreserveCFG = true);
+LLVM_ABI FunctionPass *createSROAPass(bool PreserveCFG = true,
+ bool DecomposeStructs = false);
//===----------------------------------------------------------------------===//
//
diff --git a/llvm/include/llvm/Transforms/Scalar/SROA.h b/llvm/include/llvm/Transforms/Scalar/SROA.h
index 8bb65bf7225e0..1de37b749f847 100644
--- a/llvm/include/llvm/Transforms/Scalar/SROA.h
+++ b/llvm/include/llvm/Transforms/Scalar/SROA.h
@@ -21,15 +21,31 @@ namespace llvm {
class Function;
-enum class SROAOptions : bool { ModifyCFG, PreserveCFG };
+struct SROAOptions {
+ enum PreserveCFGOption : bool { ModifyCFG, PreserveCFG };
+ enum DecomposeStructsOption : bool { NoDecomposeStructs, DecomposeStructs };
+ PreserveCFGOption PCFGOption;
+ DecomposeStructsOption DSOption;
+ SROAOptions(PreserveCFGOption PCFGOption)
+ : PCFGOption(PCFGOption), DSOption(NoDecomposeStructs) {}
+ SROAOptions(PreserveCFGOption PCFGOption, DecomposeStructsOption DSOption)
+ : PCFGOption(PCFGOption), DSOption(DSOption) {}
+};
class SROAPass : public PassInfoMixin<SROAPass> {
- const SROAOptions PreserveCFG;
+ const SROAOptions Options;
public:
/// If \p PreserveCFG is set, then the pass is not allowed to modify CFG
/// in any way, even if it would update CFG analyses.
- SROAPass(SROAOptions PreserveCFG);
+ SROAPass(SROAOptions::PreserveCFGOption PreserveCFG);
+
+ /// If \p Options.PreserveCFG is set, then the pass is not allowed to modify
+ /// CFG in any way, even if it would update CFG analyses.
+ /// If \p Options.DecomposeStructs is set, then the pass will decompose
+ /// structs allocas into its constituent components regardless of whether or
+ /// not pointer offsets into them are known at compile time.
+ SROAPass(const SROAOptions &Options);
/// Run the pass over the function.
PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index bf62623099a97..dee1dd7b2a710 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -1506,6 +1506,10 @@ bool TargetTransformInfo::allowVectorElementIndexingUsingGEP() const {
return TTIImpl->allowVectorElementIndexingUsingGEP();
}
+bool TargetTransformInfo::shouldDecomposeStructAllocas() const {
+ return TTIImpl->shouldDecomposeStructAllocas();
+}
+
TargetTransformInfoImplBase::~TargetTransformInfoImplBase() = default;
TargetIRAnalysis::TargetIRAnalysis() : TTICallback(&getDefaultTTI) {}
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index c234623caecf9..4f918a33f4dc3 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -1353,16 +1353,29 @@ Expected<ScalarizerPassOptions> parseScalarizerOptions(StringRef Params) {
}
Expected<SROAOptions> parseSROAOptions(StringRef Params) {
- if (Params.empty() || Params == "modify-cfg")
- return SROAOptions::ModifyCFG;
- if (Params == "preserve-cfg")
- return SROAOptions::PreserveCFG;
- return make_error<StringError>(
- formatv("invalid SROA pass parameter '{}' (either preserve-cfg or "
- "modify-cfg can be specified)",
- Params)
- .str(),
- inconvertibleErrorCode());
+ SROAOptions::PreserveCFGOption PreserveCFG = SROAOptions::ModifyCFG;
+ SROAOptions::DecomposeStructsOption DecomposeStructs =
+ SROAOptions::NoDecomposeStructs;
+
+ while (!Params.empty()) {
+ StringRef ParamName;
+ std::tie(ParamName, Params) = Params.split(';');
+
+ if (ParamName.consume_front("preserve-cfg"))
+ PreserveCFG = SROAOptions::PreserveCFG;
+ else if (ParamName.consume_front("modify-cfg"))
+ PreserveCFG = SROAOptions::ModifyCFG;
+ else if (ParamName.consume_front("no-decompose-structs"))
+ DecomposeStructs = SROAOptions::NoDecomposeStructs;
+ else if (ParamName.consume_front("decompose-structs"))
+ DecomposeStructs = SROAOptions::DecomposeStructs;
+ else
+ return make_error<StringError>(
+ formatv("invalid SROA pass option '{}'", ParamName).str(),
+ inconvertibleErrorCode());
+ }
+
+ return SROAOptions(PreserveCFG, DecomposeStructs);
}
Expected<StackLifetime::LivenessType>
diff --git a/llvm/lib/Target/DirectX/DirectXTargetMachine.cpp b/llvm/lib/Target/DirectX/DirectXTargetMachine.cpp
index bcf84403b2c0d..29df12b24850e 100644
--- a/llvm/lib/Target/DirectX/DirectXTargetMachine.cpp
+++ b/llvm/lib/Target/DirectX/DirectXTargetMachine.cpp
@@ -48,6 +48,7 @@
#include "llvm/Target/TargetLoweringObjectFile.h"
#include "llvm/Transforms/IPO/GlobalDCE.h"
#include "llvm/Transforms/Scalar.h"
+#include "llvm/Transforms/Scalar/SROA.h"
#include "llvm/Transforms/Scalar/Scalarizer.h"
#include <optional>
@@ -107,6 +108,10 @@ class DirectXPassConfig : public TargetPassConfig {
FunctionPass *createTargetRegisterAllocator(bool) override { return nullptr; }
void addCodeGenPrepare() override {
+ // Clang does not apply SROA with -O0, but it is required for DXIL. So we
+ // add SROA here when -O0 is given.
+ if (getOptLevel() == CodeGenOptLevel::None)
+ addPass(createSROAPass(/*PreserveCFG=*/true, /*DecomposeStructs=*/true));
addPass(createDXILFinalizeLinkageLegacyPass());
addPass(createGlobalDCEPass());
addPass(createDXILResourceAccessLegacyPass());
diff --git a/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.cpp b/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.cpp
index 68fd3e0bc74c7..8193b5c40acc4 100644
--- a/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.cpp
+++ b/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.cpp
@@ -65,3 +65,5 @@ bool DirectXTTIImpl::isTargetIntrinsicTriviallyScalarizable(
return false;
}
}
+
+bool DirectXTTIImpl::shouldDecomposeStructAllocas() const { return true; }
diff --git a/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.h b/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.h
index e2dd4354a8167..5a15d0a4f8510 100644
--- a/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.h
+++ b/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.h
@@ -39,6 +39,7 @@ class DirectXTTIImpl final : public BasicTTIImplBase<DirectXTTIImpl> {
unsigned ScalarOpdIdx) const override;
bool isTargetIntrinsicWithOverloadTypeAtArg(Intrinsic::ID ID,
int OpdIdx) const override;
+ bool shouldDecomposeStructAllocas() const override;
};
} // namespace llvm
diff --git a/llvm/lib/Transforms/Scalar/SROA.cpp b/llvm/lib/Transforms/Scalar/SROA.cpp
index 45d3d493a9e68..f62bbe23b0827 100644
--- a/llvm/lib/Transforms/Scalar/SROA.cpp
+++ b/llvm/lib/Transforms/Scalar/SROA.cpp
@@ -43,6 +43,7 @@
#include "llvm/Analysis/GlobalsModRef.h"
#include "llvm/Analysis/Loads.h"
#include "llvm/Analysis/PtrUseVisitor.h"
+#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/Config/llvm-config.h"
#include "llvm/IR/BasicBlock.h"
@@ -56,6 +57,7 @@
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"
+#include "llvm/IR/GEPNoWrapFlags.h"
#include "llvm/IR/GlobalAlias.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstVisitor.h"
@@ -172,8 +174,10 @@ using RewriteableMemOps = SmallVector<RewriteableMemOp, 2>;
class SROA {
LLVMContext *const C;
DomTreeUpdater *const DTU;
+ TargetTransformInfo *const TTI;
AssumptionCache *const AC;
const bool PreserveCFG;
+ const bool DecomposeStructs;
/// Worklist of alloca instructions to simplify.
///
@@ -235,10 +239,11 @@ class SROA {
isSafeSelectToSpeculate(SelectInst &SI, bool PreserveCFG);
public:
- SROA(LLVMContext *C, DomTreeUpdater *DTU, AssumptionCache *AC,
- SROAOptions PreserveCFG_)
- : C(C), DTU(DTU), AC(AC),
- PreserveCFG(PreserveCFG_ == SROAOptions::PreserveCFG) {}
+ SROA(LLVMContext *C, DomTreeUpdater *DTU, TargetTransformInfo *TTI,
+ AssumptionCache *AC, const SROAOptions &Options)
+ : C(C), DTU(DTU), TTI(TTI), AC(AC),
+ PreserveCFG(Options.PCFGOption == SROAOptions::PreserveCFG),
+ DecomposeStructs(Options.DSOption == SROAOptions::DecomposeStructs) {}
/// Main run method used by both the SROAPass and by the legacy pass.
std::pair<bool /*Changed*/, bool /*CFGChanged*/> runSROA(Function &F);
@@ -246,6 +251,7 @@ class SROA {
private:
friend class AllocaSliceRewriter;
+ bool decomposeStructAlloca(AllocaInst &AI);
bool presplitLoadsAndStores(AllocaInst &AI, AllocaSlices &AS);
AllocaInst *rewritePartition(AllocaInst &AI, AllocaSlices &AS, Partition &P);
bool splitAlloca(AllocaInst &AI, AllocaSlices &AS);
@@ -4511,6 +4517,299 @@ class AggLoadStoreRewriter : public InstVisitor<AggLoadStoreRewriter, bool> {
} // end anonymous namespace
+/// Returns the pointee type of a given pointer value.
+///
+/// This function inspects the provided `Value *Ptr`, which must be a pointer
+/// type, and attempts to determine the type of the object it points to. It
+/// handles several common LLVM IR constructs:
+///
+/// - `AllocaInst`: Returns the allocated type.
+/// - `GlobalValue`: Returns the value type of the global.
+/// - `GetElementPtrInst`: Returns the result element type.
+/// - `Argument`: If marked with `byval` or `byref`, returns the corresponding
+/// parameter type.
+///
+/// \param Ptr a pointer-typed Value.
+/// \returns the pointee `Type *` if it can be determined, or `nullptr`
+/// otherwise.
+static Type *getPointeeType(Value *Ptr) {
+ assert(Ptr->getType()->isPointerTy());
+ Type *Ty = nullptr;
+ if (AllocaInst *Alloca = dyn_cast<AllocaInst>(Ptr))
+ Ty = Alloca->getAllocatedType();
+ else if (GlobalValue *GV = dyn_cast<GlobalValue>(Ptr))
+ Ty = GV->getValueType();
+ else if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(Ptr))
+ Ty = GEP->getResultElementType();
+ else if (Argument *Arg = dyn_cast<Argument>(Ptr)) {
+ if (Arg->hasByValAttr())
+ Ty = Arg->getParamByValType();
+ else if (Arg->hasByRefAttr())
+ Ty = Arg->getParamByRefType();
+ }
+ return Ty;
+}
+
+namespace {
+
+/// A visitor that determines whether or not a struct-based alloca can be
+/// decomposed into separate allocas for each of its individual members.
+///
+/// The analysis walks through the uses of the alloca, validating each
+/// instruction to ensure it conforms to expected patterns (e.g., constant GEP
+/// indices, correct struct types). If any unsupported or ambiguous access is
+/// encountered, the visitor is aborted.
+///
+/// This visitor provides iteration support over valid accesses to be replaced,
+/// tracks dead users after struct alloca decomposition, and exposes the
+/// first instruction that caused an abort if the visit determines that the
+/// struct alloca can not be decomposed.
+class StructDecompositionAnalysis
+ : public InstVisitor<StructDecompositionAnalysis> {
+public:
+ StructDecompositionAnalysis(AllocaInst &AI) {
+ this->AI = &AI;
+
+ // Ensure the allocated type is a struct or (multi-dimensional) array of
+ // structs.
+ Type *Ty = AI.getAllocatedType();
+ while (isa<ArrayType>(Ty))
+ Ty = Ty->getArrayElementType();
+ StructTy = dyn_cast<StructType>(Ty);
+ if (!StructTy) {
+ AbortedInfo = &AI;
+ return;
+ }
+ const DataLayout &DL = AI.getDataLayout();
+ assert(DL.getTypeAllocSize(StructTy).isFixed() &&
+ "The struct must have a fixed size!");
+ StructSizeInBytes = DL.getTypeAllocSize(StructTy).getFixedValue();
+
+ enqueueUses(AI);
+
+ // Visit all the uses off the worklist until it is empty or we abort.
+ while (!Worklist.empty() && !isAborted()) {
+ Use *U = Worklist.pop_back_val();
+ Instruction *User = cast<Instruction>(U->getUser());
+ visit(User);
+ }
+ }
+
+ /// Support for iterating over the accesses to the struct alloca.
+ /// @{
+ using iterator = SmallVector<Instruction *>::iterator;
+ using range = iterator_range<iterator>;
+
+ iterator begin() { return Accesses.begin(); }
+ iterator end() { return Accesses.end(); }
+
+ using const_iterator = SmallVector<Instruction *>::const_iterator;
+ using const_range = iterator_range<const_iterator>;
+
+ const_iterator begin() const { return Accesses.begin(); }
+ const_iterator end() const { return Accesses.end(); }
+ /// @}
+
+ /// If there are instructions that are not handled by the struct decomposer,
+ /// then abort decomposing the struct.
+ bool isAborted() { return AbortedInfo != nullptr; }
+
+ /// Get the instruction causing the visit to abort.
+ /// \returns a pointer to the instruction causing the abort if one is
+ /// available; otherwise returns null.
+ Instruction *getAbortingInst() const { return AbortedInfo; }
+
+ /// Access the dead users for this alloca after struct decomposition.
+ ArrayRef<Instruction *> getDeadUsers() const { return DeadUsers; }
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+ void print(raw_ostream &OS, const_iterator I, StringRef Indent = " ") const;
+ void print(raw_ostream &OS) const;
+ void dump(const_iterator I) const;
+ void dump() const;
+#endif
+
+private:
+ friend InstVisitor<StructDecompositionAnalysis>;
+
+ SmallVector<Instruction *> Accesses;
+
+ /// The AllocaInst being visited, its size, and its corresponding StructType.
+ AllocaInst *AI;
+ uint64_t StructSizeInBytes;
+ StructType *StructTy;
+
+ /// The worklist of to-visit uses.
+ SmallVector<Use *, 8> Worklist;
+
+ /// If the struct is invalid to be decomposed, this analysis will be aborted.
+ Instruction *AbortedInfo = nullptr;
+
+ /// Users of the Alloca which will be considered dead if the Alloca is
+ /// decomposed
+ SmallVector<Instruction *, 8> DeadUsers;
+
+ /// A set of visited uses to break cycles in unreachable code.
+ SmallPtrSet<Use *, 8> VisitedUses;
+
+ /// Set to de-duplicate dead instructions found in the use walk.
+ SmallPtrSet<Instruction *, 4> VisitedDeadInsts;
+
+ void enqueueUses(Value &I) {
+ for (Use &U : I.uses())
+ if (VisitedUses.insert(&U).second)
+ Worklist.push_back(&U);
+ }
+
+ void visitGetElementPtrInst(GetElementPtrInst &GEPI) {
+ // The GEPs visited must have a source element type of the struct or a
+ // (multi-dimensional) array of structs. Otherwise the intended access chain
+ // for the struct can be ambiguous.
+ unsigned StructMemberOperandIdx = 2;
+ Type *Ty = GEPI.getSourceElementType();
+ while (Ty->isArrayTy()) {
+ Ty = Ty->getArrayElementType();
+ StructMemberOperandIdx++;
+ }
+ if (Ty != StructTy) {
+ AbortedInfo = &GEPI;
+ return;
+ }
+
+ // If this GEP does not have the struct member index, then visit its uses.
+ if (GEPI.getNumOperands() < StructMemberOperandIdx + 1) {
+ markAsDead(GEPI);
+ enqueueUses(GEPI);
+ return;
+ }
+
+ // Ensure the struct member index is constant.
+ Value *StructMemberIdx = GEPI.getOperand(StructMemberOperandIdx);
+ if (!isa<ConstantInt>(StructMemberIdx)) {
+ AbortedInfo = &GEPI;
+ return;
+ }
+
+ Accesses.push_back(&GEPI);
+ }
+
+ void visitMemSetInst(MemSetInst &MSI) {
+ // Ensure the number of bytes set is a multiple of the struct size in
+ // bytes.
+ if (!MSI.getLengthInBytes()) {
+ AbortedInfo = &MSI;
+ return;
+ }
+ APInt Length = *MSI.getLengthInBytes();
+ if (Length.getZExtValue() % StructSizeInBytes != 0) {
+ AbortedInfo = &MSI;
+ return;
+ }
+
+ // Ensure we are setting the bytes of the correct type of struct.
+ Value *Dest = MSI.getDest();
+ if (AllocaInst *Alloca = dyn_cast<AllocaInst>(Dest))
+ assert(Alloca == AI &&
+ "It should be impossible to visit the allocas of other structs!");
+ else if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(Dest)) {
+ [[maybe_unused]] Type *Ty = GEP->getResultElementType();
+ while (Ty->isArrayTy())
+ Ty = Ty->getArrayElementType();
+ assert(Ty == StructTy &&
+ "GEP must have a result element type of the expected struct or a "
+ "(multi-dimensional) array of it!");
+ } else {
+ AbortedInfo = &MSI;
+ return;
+ }
+ Accesses.push_back(&MSI);
+ }
+
+ void visitMemTransferInst(MemTransferInst &MTI) {
+ // Ensure the number of bytes transferred is a multiple of the struct size
+ // in bytes.
+ if (!MTI.getLengthInBytes()) {
+ AbortedInfo = &MTI;
+ return;
+ }
+ APInt Length = *MTI.getLengthInBytes();
+ if (Length.getZExtValue() % StructSizeInBytes != 0) {
+ AbortedInfo = &MTI;
+ return;
+ }
+
+ // Ensure we are transferring the bytes of the correct type of struct.
+ auto IsStructTy = [&](Type *Ty) -> bool {
+ while (Ty->isArrayTy())
+ Ty = Ty->getArrayElementType();
+ return Ty == StructTy;
+ };
+
+ Value *Dest = MTI.getRawDest();
+ Type *DestPtrTy = getPointeeType(Dest);
+ Value *Src = MTI.getRawSource();
+ Type *SrcPtrTy = getPointeeType(Src);
+ if (!DestPtrTy || !SrcPtrTy || DestPtrTy != SrcPtrTy ||
+ !IsStructTy(DestPtrTy)) {
+ AbortedInfo = &MTI;
+ return;
+ }
+
+ Accesses.push_back(&MTI);
+ }
+
+ void visitInstruction(Instruction &I) { AbortedInfo = &I; }
+
+ void visitIntrinsicInst(IntrinsicInst &II) {
+ switch (II.getIntrinsicID()) {
+ case Intrinsic::lifetime_start:
+ case Intrinsic::lifetime_end:
+ Accesses.push_back(&II);
+ break;
+ default:
+ AbortedInfo = &II;
+ }
+ }
+
+ void markAsDead(Instruction &I) {
+ if (VisitedDeadInsts.insert(&I).second)
+ DeadUsers.push_back(&I);
+ }
+};
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+
+void StructDecompositionAnalysis::print(raw_ostream &OS, const_iterator I,
+ StringRef Indent) const {
+ OS << Indent << **I << "\n";
+}
+
+void StructDecompositionAnalysis::print(raw_ostream &OS) const {
+ if (AbortedInfo) {
+ OS << "Can't decompose struct alloca: " << *AI << "\n"
+ << " An access to this alloca is not supported:\n"
+ << " " << *AbortedInfo << "\n";
+ return;
+ }
+
+ OS << "Instructions to rewrite for this alloca: " << *AI << "\n";
+ for (const_iterator I = begin(), E = end(); I != E; ++I)
+ print(OS, I);
+}
+
+LLVM_DUMP_METHOD void
+StructDecompositionAnalysis::dump(const_iterator I) const {
+ print(dbgs(), I);
+}
+
+LLVM_DUMP_METHOD void StructDecompositionAnalysis::dump() const {
+ print(dbgs());
+}
+
+#endif // !de...
[truncated]
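The `;`-separated option parsing shown in the PassBuilder.cpp hunk above can be sketched without LLVM dependencies. This is a minimal, standard-library-only approximation: `SketchSROAOptions` and `parseSketchSROAOptions` are hypothetical names, `std::optional` stands in for `Expected<SROAOptions>`, and the sketch matches option names exactly, whereas the patch uses `StringRef::consume_front`, which is a prefix match.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical stand-in for the SROAOptions struct in the patch.
struct SketchSROAOptions {
  bool PreserveCFG = false;      // default corresponds to modify-cfg
  bool DecomposeStructs = false; // default corresponds to no-decompose-structs
};

// Splits Params on ';' and applies each recognized option in order.
// Returns std::nullopt for an unrecognized (or empty) option name.
std::optional<SketchSROAOptions>
parseSketchSROAOptions(std::string_view Params) {
  SketchSROAOptions Opts;
  while (!Params.empty()) {
    std::size_t Pos = Params.find(';');
    std::string_view Name = Params.substr(0, Pos);
    Params = (Pos == std::string_view::npos) ? std::string_view{}
                                             : Params.substr(Pos + 1);
    if (Name == "preserve-cfg")
      Opts.PreserveCFG = true;
    else if (Name == "modify-cfg")
      Opts.PreserveCFG = false;
    else if (Name == "no-decompose-structs")
      Opts.DecomposeStructs = false;
    else if (Name == "decompose-structs")
      Opts.DecomposeStructs = true;
    else
      return std::nullopt;
  }
  return Opts;
}

// Usage: both options can be combined in one parameter string.
void sketchUsage() {
  auto Opts = parseSketchSROAOptions("preserve-cfg;decompose-structs");
  assert(Opts && Opts->PreserveCFG && Opts->DecomposeStructs);
}
```

Exact matching is a deliberate difference in this sketch: with a prefix match, a name such as `preserve-cfgX` would be accepted silently. Whether the patch should reject such input is a question for the PR's reviewers rather than something this sketch settles.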
@llvm/pr-subscribers-backend-directx

Author: Deric C. (Icohedron)

Changes

Fixes #147109 and #160773

This PR adds "struct alloca decomposition" to SROA for use by backends (namely DirectX / DXIL) that do not allow alloca instructions with struct types.

Inclusion of struct alloca decomposition in SROA

Struct alloca decomposition eliminates struct-based alloca instructions by replacing them with separate alloca instructions for each of the struct members. It can be thought of as an Array-of-Structs to Struct-of-Arrays transformation without the creation of a new struct to hold all the arrays.

This implementation currently supports only struct-based alloca instructions used by GEPs, memset, memcpy/memmove, and lifetime intrinsics.

SROA pass options changes

With this change, the SROA pass options have become a struct, so that struct alloca decomposition can be toggled alongside the existing preserve-cfg option.

TargetTransformInfo changes

To enable struct alloca decomposition when compiling for the DirectX / DXIL backend, an additional TargetTransformInfo entry is added; the diff below declares it as shouldDecomposeStructAllocas().

Notes

Struct alloca decomposition is unsafe in languages that allow dynamic indexing across struct members (e.g., treating a pointer to one member as a base for accessing others via computed offsets). Such behavior can violate the assumptions of this decomposition, which expects each member to be accessed independently and explicitly.

Patch is 54.29 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/161601.diff

11 Files Affected:
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 7a4abe9ee5082..599ad3afd008c 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -1977,6 +1977,11 @@ class TargetTransformInfo {
/// target.
LLVM_ABI bool allowVectorElementIndexingUsingGEP() const;
+ /// \returns True if the target does not support struct allocas and therefore
+ /// requires struct alloca instructions to be scalarized / decomposed into
+ /// its components.
+ LLVM_ABI bool shouldDecomposeStructAllocas() const;
+
private:
std::unique_ptr<const TargetTransformInfoImplBase> TTIImpl;
};
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index 566e1cf51631a..6b8fc753580ac 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -1163,6 +1163,8 @@ class TargetTransformInfoImplBase {
virtual bool allowVectorElementIndexingUsingGEP() const { return true; }
+ virtual bool shouldDecomposeStructAllocas() const { return false; }
+
protected:
// Obtain the minimum required size to hold the value (without the sign)
// In case of a vector it returns the min required size for one element.
diff --git a/llvm/include/llvm/Transforms/Scalar.h b/llvm/include/llvm/Transforms/Scalar.h
index 8e68b6a57e51f..20c05687e1b4c 100644
--- a/llvm/include/llvm/Transforms/Scalar.h
+++ b/llvm/include/llvm/Transforms/Scalar.h
@@ -44,7 +44,8 @@ LLVM_ABI FunctionPass *createDeadStoreEliminationPass();
//
// SROA - Replace aggregates or pieces of aggregates with scalar SSA values.
//
-LLVM_ABI FunctionPass *createSROAPass(bool PreserveCFG = true);
+LLVM_ABI FunctionPass *createSROAPass(bool PreserveCFG = true,
+ bool DecomposeStructs = false);
//===----------------------------------------------------------------------===//
//
diff --git a/llvm/include/llvm/Transforms/Scalar/SROA.h b/llvm/include/llvm/Transforms/Scalar/SROA.h
index 8bb65bf7225e0..1de37b749f847 100644
--- a/llvm/include/llvm/Transforms/Scalar/SROA.h
+++ b/llvm/include/llvm/Transforms/Scalar/SROA.h
@@ -21,15 +21,31 @@ namespace llvm {
class Function;
-enum class SROAOptions : bool { ModifyCFG, PreserveCFG };
+struct SROAOptions {
+ enum PreserveCFGOption : bool { ModifyCFG, PreserveCFG };
+ enum DecomposeStructsOption : bool { NoDecomposeStructs, DecomposeStructs };
+ PreserveCFGOption PCFGOption;
+ DecomposeStructsOption DSOption;
+ SROAOptions(PreserveCFGOption PCFGOption)
+ : PCFGOption(PCFGOption), DSOption(NoDecomposeStructs) {}
+ SROAOptions(PreserveCFGOption PCFGOption, DecomposeStructsOption DSOption)
+ : PCFGOption(PCFGOption), DSOption(DSOption) {}
+};
class SROAPass : public PassInfoMixin<SROAPass> {
- const SROAOptions PreserveCFG;
+ const SROAOptions Options;
public:
/// If \p PreserveCFG is set, then the pass is not allowed to modify CFG
/// in any way, even if it would update CFG analyses.
- SROAPass(SROAOptions PreserveCFG);
+ SROAPass(SROAOptions::PreserveCFGOption PreserveCFG);
+
+ /// If \p Options.PreserveCFG is set, then the pass is not allowed to modify
+ /// CFG in any way, even if it would update CFG analyses.
+ /// If \p Options.DecomposeStructs is set, then the pass will decompose
+ /// structs allocas into its constituent components regardless of whether or
+ /// not pointer offsets into them are known at compile time.
+ SROAPass(const SROAOptions &Options);
/// Run the pass over the function.
PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index bf62623099a97..dee1dd7b2a710 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -1506,6 +1506,10 @@ bool TargetTransformInfo::allowVectorElementIndexingUsingGEP() const {
return TTIImpl->allowVectorElementIndexingUsingGEP();
}
+bool TargetTransformInfo::shouldDecomposeStructAllocas() const {
+ return TTIImpl->shouldDecomposeStructAllocas();
+}
+
TargetTransformInfoImplBase::~TargetTransformInfoImplBase() = default;
TargetIRAnalysis::TargetIRAnalysis() : TTICallback(&getDefaultTTI) {}
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index c234623caecf9..4f918a33f4dc3 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -1353,16 +1353,29 @@ Expected<ScalarizerPassOptions> parseScalarizerOptions(StringRef Params) {
}
Expected<SROAOptions> parseSROAOptions(StringRef Params) {
- if (Params.empty() || Params == "modify-cfg")
- return SROAOptions::ModifyCFG;
- if (Params == "preserve-cfg")
- return SROAOptions::PreserveCFG;
- return make_error<StringError>(
- formatv("invalid SROA pass parameter '{}' (either preserve-cfg or "
- "modify-cfg can be specified)",
- Params)
- .str(),
- inconvertibleErrorCode());
+ SROAOptions::PreserveCFGOption PreserveCFG = SROAOptions::ModifyCFG;
+ SROAOptions::DecomposeStructsOption DecomposeStructs =
+ SROAOptions::NoDecomposeStructs;
+
+ while (!Params.empty()) {
+ StringRef ParamName;
+ std::tie(ParamName, Params) = Params.split(';');
+
+ if (ParamName.consume_front("preserve-cfg"))
+ PreserveCFG = SROAOptions::PreserveCFG;
+ else if (ParamName.consume_front("modify-cfg"))
+ PreserveCFG = SROAOptions::ModifyCFG;
+ else if (ParamName.consume_front("no-decompose-structs"))
+ DecomposeStructs = SROAOptions::NoDecomposeStructs;
+ else if (ParamName.consume_front("decompose-structs"))
+ DecomposeStructs = SROAOptions::DecomposeStructs;
+ else
+ return make_error<StringError>(
+ formatv("invalid SROA pass option '{}'", ParamName).str(),
+ inconvertibleErrorCode());
+ }
+
+ return SROAOptions(PreserveCFG, DecomposeStructs);
}
Expected<StackLifetime::LivenessType>
diff --git a/llvm/lib/Target/DirectX/DirectXTargetMachine.cpp b/llvm/lib/Target/DirectX/DirectXTargetMachine.cpp
index bcf84403b2c0d..29df12b24850e 100644
--- a/llvm/lib/Target/DirectX/DirectXTargetMachine.cpp
+++ b/llvm/lib/Target/DirectX/DirectXTargetMachine.cpp
@@ -48,6 +48,7 @@
#include "llvm/Target/TargetLoweringObjectFile.h"
#include "llvm/Transforms/IPO/GlobalDCE.h"
#include "llvm/Transforms/Scalar.h"
+#include "llvm/Transforms/Scalar/SROA.h"
#include "llvm/Transforms/Scalar/Scalarizer.h"
#include <optional>
@@ -107,6 +108,10 @@ class DirectXPassConfig : public TargetPassConfig {
FunctionPass *createTargetRegisterAllocator(bool) override { return nullptr; }
void addCodeGenPrepare() override {
+ // Clang does not apply SROA with -O0, but it is required for DXIL. So we
+ // add SROA here when -O0 is given.
+ if (getOptLevel() == CodeGenOptLevel::None)
+ addPass(createSROAPass(/*PreserveCFG=*/true, /*DecomposeStructs=*/true));
addPass(createDXILFinalizeLinkageLegacyPass());
addPass(createGlobalDCEPass());
addPass(createDXILResourceAccessLegacyPass());
diff --git a/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.cpp b/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.cpp
index 68fd3e0bc74c7..8193b5c40acc4 100644
--- a/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.cpp
+++ b/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.cpp
@@ -65,3 +65,5 @@ bool DirectXTTIImpl::isTargetIntrinsicTriviallyScalarizable(
return false;
}
}
+
+bool DirectXTTIImpl::shouldDecomposeStructAllocas() const { return true; }
diff --git a/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.h b/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.h
index e2dd4354a8167..5a15d0a4f8510 100644
--- a/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.h
+++ b/llvm/lib/Target/DirectX/DirectXTargetTransformInfo.h
@@ -39,6 +39,7 @@ class DirectXTTIImpl final : public BasicTTIImplBase<DirectXTTIImpl> {
unsigned ScalarOpdIdx) const override;
bool isTargetIntrinsicWithOverloadTypeAtArg(Intrinsic::ID ID,
int OpdIdx) const override;
+ bool shouldDecomposeStructAllocas() const override;
};
} // namespace llvm
diff --git a/llvm/lib/Transforms/Scalar/SROA.cpp b/llvm/lib/Transforms/Scalar/SROA.cpp
index 45d3d493a9e68..f62bbe23b0827 100644
--- a/llvm/lib/Transforms/Scalar/SROA.cpp
+++ b/llvm/lib/Transforms/Scalar/SROA.cpp
@@ -43,6 +43,7 @@
#include "llvm/Analysis/GlobalsModRef.h"
#include "llvm/Analysis/Loads.h"
#include "llvm/Analysis/PtrUseVisitor.h"
+#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/Config/llvm-config.h"
#include "llvm/IR/BasicBlock.h"
@@ -56,6 +57,7 @@
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"
+#include "llvm/IR/GEPNoWrapFlags.h"
#include "llvm/IR/GlobalAlias.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstVisitor.h"
@@ -172,8 +174,10 @@ using RewriteableMemOps = SmallVector<RewriteableMemOp, 2>;
class SROA {
LLVMContext *const C;
DomTreeUpdater *const DTU;
+ TargetTransformInfo *const TTI;
AssumptionCache *const AC;
const bool PreserveCFG;
+ const bool DecomposeStructs;
/// Worklist of alloca instructions to simplify.
///
@@ -235,10 +239,11 @@ class SROA {
isSafeSelectToSpeculate(SelectInst &SI, bool PreserveCFG);
public:
- SROA(LLVMContext *C, DomTreeUpdater *DTU, AssumptionCache *AC,
- SROAOptions PreserveCFG_)
- : C(C), DTU(DTU), AC(AC),
- PreserveCFG(PreserveCFG_ == SROAOptions::PreserveCFG) {}
+ SROA(LLVMContext *C, DomTreeUpdater *DTU, TargetTransformInfo *TTI,
+ AssumptionCache *AC, const SROAOptions &Options)
+ : C(C), DTU(DTU), TTI(TTI), AC(AC),
+ PreserveCFG(Options.PCFGOption == SROAOptions::PreserveCFG),
+ DecomposeStructs(Options.DSOption == SROAOptions::DecomposeStructs) {}
/// Main run method used by both the SROAPass and by the legacy pass.
std::pair<bool /*Changed*/, bool /*CFGChanged*/> runSROA(Function &F);
@@ -246,6 +251,7 @@ class SROA {
private:
friend class AllocaSliceRewriter;
+ bool decomposeStructAlloca(AllocaInst &AI);
bool presplitLoadsAndStores(AllocaInst &AI, AllocaSlices &AS);
AllocaInst *rewritePartition(AllocaInst &AI, AllocaSlices &AS, Partition &P);
bool splitAlloca(AllocaInst &AI, AllocaSlices &AS);
@@ -4511,6 +4517,299 @@ class AggLoadStoreRewriter : public InstVisitor<AggLoadStoreRewriter, bool> {
} // end anonymous namespace
+/// Returns the pointee type of a given pointer value.
+///
+/// This function inspects the provided `Value *Ptr`, which must be a pointer
+/// type, and attempts to determine the type of the object it points to. It
+/// handles several common LLVM IR constructs:
+///
+/// - `AllocaInst`: Returns the allocated type.
+/// - `GlobalValue`: Returns the value type of the global.
+/// - `GetElementPtrInst`: Returns the result element type.
+/// - `Argument`: If marked with `byval` or `byref`, returns the corresponding
+/// parameter type.
+///
+/// \param Ptr a pointer-typed Value.
+/// \returns the pointee `Type *` if it can be determined, or `nullptr`
+/// otherwise.
+static Type *getPointeeType(Value *Ptr) {
+ assert(Ptr->getType()->isPointerTy());
+ Type *Ty = nullptr;
+ if (AllocaInst *Alloca = dyn_cast<AllocaInst>(Ptr))
+ Ty = Alloca->getAllocatedType();
+ else if (GlobalValue *GV = dyn_cast<GlobalValue>(Ptr))
+ Ty = GV->getValueType();
+ else if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(Ptr))
+ Ty = GEP->getResultElementType();
+ else if (Argument *Arg = dyn_cast<Argument>(Ptr)) {
+ if (Arg->hasByValAttr())
+ Ty = Arg->getParamByValType();
+ else if (Arg->hasByRefAttr())
+ Ty = Arg->getParamByRefType();
+ }
+ return Ty;
+}
+
+namespace {
+
+/// A visitor that determines whether or not a struct-based alloca can be
+/// decomposed into separate allocas for each of its individual members.
+///
+/// The analysis walks through the uses of the alloca, validating each
+/// instruction to ensure it conforms to expected patterns (e.g., constant GEP
+/// indices, correct struct types). If any unsupported or ambiguous access is
+/// encountered, the visitor is aborted.
+///
+/// This visitor provides iteration support over valid accesses to be replaced,
+/// tracks dead users after struct alloca decomposition, and exposes the
+/// first instruction that caused an abort if the visit determines that the
+/// struct alloca can not be decomposed.
+class StructDecompositionAnalysis
+ : public InstVisitor<StructDecompositionAnalysis> {
+public:
+ StructDecompositionAnalysis(AllocaInst &AI) {
+ this->AI = &AI;
+
+ // Ensure the allocated type is a struct or (multi-dimensional) array of
+ // structs.
+ Type *Ty = AI.getAllocatedType();
+ while (isa<ArrayType>(Ty))
+ Ty = Ty->getArrayElementType();
+ StructTy = dyn_cast<StructType>(Ty);
+ if (!StructTy) {
+ AbortedInfo = &AI;
+ return;
+ }
+ const DataLayout &DL = AI.getDataLayout();
+ assert(DL.getTypeAllocSize(StructTy).isFixed() &&
+ "The struct must have a fixed size!");
+ StructSizeInBytes = DL.getTypeAllocSize(StructTy).getFixedValue();
+
+ enqueueUses(AI);
+
+ // Visit all the uses off the worklist until it is empty or we abort.
+ while (!Worklist.empty() && !isAborted()) {
+ Use *U = Worklist.pop_back_val();
+ Instruction *User = cast<Instruction>(U->getUser());
+ visit(User);
+ }
+ }
+
+ /// Support for iterating over the accesses to the struct alloca.
+ /// @{
+ using iterator = SmallVector<Instruction *>::iterator;
+ using range = iterator_range<iterator>;
+
+ iterator begin() { return Accesses.begin(); }
+ iterator end() { return Accesses.end(); }
+
+ using const_iterator = SmallVector<Instruction *>::const_iterator;
+ using const_range = iterator_range<const_iterator>;
+
+ const_iterator begin() const { return Accesses.begin(); }
+ const_iterator end() const { return Accesses.end(); }
+ /// @}
+
+ /// If there are instructions that are not handled by the struct decomposer,
+ /// then abort decomposing the struct.
+ bool isAborted() { return AbortedInfo != nullptr; }
+
+ /// Get the instruction causing the visit to abort.
+ /// \returns a pointer to the instruction causing the abort if one is
+ /// available; otherwise returns null.
+ Instruction *getAbortingInst() const { return AbortedInfo; }
+
+ /// Access the dead users for this alloca after struct decomposition.
+ ArrayRef<Instruction *> getDeadUsers() const { return DeadUsers; }
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+ void print(raw_ostream &OS, const_iterator I, StringRef Indent = " ") const;
+ void print(raw_ostream &OS) const;
+ void dump(const_iterator I) const;
+ void dump() const;
+#endif
+
+private:
+ friend InstVisitor<StructDecompositionAnalysis>;
+
+ SmallVector<Instruction *> Accesses;
+
+ /// The AllocaInst being visited, its size, and its corresponding StructType.
+ AllocaInst *AI;
+ uint64_t StructSizeInBytes;
+ StructType *StructTy;
+
+ /// The worklist of to-visit uses.
+ SmallVector<Use *, 8> Worklist;
+
+ /// If the struct is invalid to be decomposed, this analysis will be aborted.
+ Instruction *AbortedInfo = nullptr;
+
+ /// Users of the Alloca which will be considered dead if the Alloca is
+ /// decomposed
+ SmallVector<Instruction *, 8> DeadUsers;
+
+ /// A set of visited uses to break cycles in unreachable code.
+ SmallPtrSet<Use *, 8> VisitedUses;
+
+ /// Set to de-duplicate dead instructions found in the use walk.
+ SmallPtrSet<Instruction *, 4> VisitedDeadInsts;
+
+ void enqueueUses(Value &I) {
+ for (Use &U : I.uses())
+ if (VisitedUses.insert(&U).second)
+ Worklist.push_back(&U);
+ }
+
+ void visitGetElementPtrInst(GetElementPtrInst &GEPI) {
+ // The GEPs visited must have a source element type of the struct or a
+ // (multi-dimensional) array of structs. Otherwise the intended access chain
+ // for the struct can be ambiguous.
+ unsigned StructMemberOperandIdx = 2;
+ Type *Ty = GEPI.getSourceElementType();
+ while (Ty->isArrayTy()) {
+ Ty = Ty->getArrayElementType();
+ StructMemberOperandIdx++;
+ }
+ if (Ty != StructTy) {
+ AbortedInfo = &GEPI;
+ return;
+ }
+
+ // If this GEP does not have the struct member index, then visit its uses.
+ if (GEPI.getNumOperands() < StructMemberOperandIdx + 1) {
+ markAsDead(GEPI);
+ enqueueUses(GEPI);
+ return;
+ }
+
+ // Ensure the struct member index is constant.
+ Value *StructMemberIdx = GEPI.getOperand(StructMemberOperandIdx);
+ if (!isa<ConstantInt>(StructMemberIdx)) {
+ AbortedInfo = &GEPI;
+ return;
+ }
+
+ Accesses.push_back(&GEPI);
+ }
+
+ void visitMemSetInst(MemSetInst &MSI) {
+ // Ensure the number of bytes set is a multiple of the struct size in
+ // bytes.
+ if (!MSI.getLengthInBytes()) {
+ AbortedInfo = &MSI;
+ return;
+ }
+ APInt Length = *MSI.getLengthInBytes();
+ if (Length.getZExtValue() % StructSizeInBytes != 0) {
+ AbortedInfo = &MSI;
+ return;
+ }
+
+ // Ensure we are setting the bytes of the correct type of struct.
+ Value *Dest = MSI.getDest();
+ if (AllocaInst *Alloca = dyn_cast<AllocaInst>(Dest))
+ assert(Alloca == AI &&
+ "It should be impossible to visit the allocas of other structs!");
+ else if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(Dest)) {
+ [[maybe_unused]] Type *Ty = GEP->getResultElementType();
+ while (Ty->isArrayTy())
+ Ty = Ty->getArrayElementType();
+ assert(Ty == StructTy &&
+ "GEP must have a result element type of the expected struct or a "
+ "(multi-dimensional) array of it!");
+ } else {
+ AbortedInfo = &MSI;
+ return;
+ }
+ Accesses.push_back(&MSI);
+ }
+
+ void visitMemTransferInst(MemTransferInst &MTI) {
+ // Ensure the number of bytes transferred is a multiple of the struct size
+ // in bytes.
+ if (!MTI.getLengthInBytes()) {
+ AbortedInfo = &MTI;
+ return;
+ }
+ APInt Length = *MTI.getLengthInBytes();
+ if (Length.getZExtValue() % StructSizeInBytes != 0) {
+ AbortedInfo = &MTI;
+ return;
+ }
+
+ // Ensure we are transferring the bytes of the correct type of struct.
+ auto IsStructTy = [&](Type *Ty) -> bool {
+ while (Ty->isArrayTy())
+ Ty = Ty->getArrayElementType();
+ return Ty == StructTy;
+ };
+
+ Value *Dest = MTI.getRawDest();
+ Type *DestPtrTy = getPointeeType(Dest);
+ Value *Src = MTI.getRawSource();
+ Type *SrcPtrTy = getPointeeType(Src);
+ if (!DestPtrTy || !SrcPtrTy || DestPtrTy != SrcPtrTy ||
+ !IsStructTy(DestPtrTy)) {
+ AbortedInfo = &MTI;
+ return;
+ }
+
+ Accesses.push_back(&MTI);
+ }
+
+ void visitInstruction(Instruction &I) { AbortedInfo = &I; }
+
+ void visitIntrinsicInst(IntrinsicInst &II) {
+ switch (II.getIntrinsicID()) {
+ case Intrinsic::lifetime_start:
+ case Intrinsic::lifetime_end:
+ Accesses.push_back(&II);
+ break;
+ default:
+ AbortedInfo = &II;
+ }
+ }
+
+ void markAsDead(Instruction &I) {
+ if (VisitedDeadInsts.insert(&I).second)
+ DeadUsers.push_back(&I);
+ }
+};
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+
+void StructDecompositionAnalysis::print(raw_ostream &OS, const_iterator I,
+ StringRef Indent) const {
+ OS << Indent << **I << "\n";
+}
+
+void StructDecompositionAnalysis::print(raw_ostream &OS) const {
+ if (AbortedInfo) {
+ OS << "Can't decompose struct alloca: " << *AI << "\n"
+ << " An access to this alloca is not supported:\n"
+ << " " << *AbortedInfo << "\n";
+ return;
+ }
+
+ OS << "Instructions to rewrite for this alloca: " << *AI << "\n";
+ for (const_iterator I = begin(), E = end(); I != E; ++I)
+ print(OS, I);
+}
+
+LLVM_DUMP_METHOD void
+StructDecompositionAnalysis::dump(const_iterator I) const {
+ print(dbgs(), I);
+}
+
+LLVM_DUMP_METHOD void StructDecompositionAnalysis::dump() const {
+ print(dbgs());
+}
+
+#endif // !de...
[truncated]
✅ With the latest revision this PR passed the C/C++ code formatter.
It should be noted that struct alloca decomposition is unsafe in languages that allow dynamic indexing across struct members (e.g., treating a pointer to one member as a base for accessing others via computed offsets). Such behavior can violate the assumptions of this decomposition, which expects each member to be accessed independently and explicitly.
Therefore struct alloca decomposition is disabled by default and must be explicitly enabled via an SROA pass option or TargetTransformInfo via shouldDecomposeStructAllocas.
Notably, these languages also include LLVM IR. Transforms must always operate in terms of LLVM IR semantics, and cannot make use of additional guarantees that are not implied by those semantics.
I believe that to make this kind of transform we first need to implement the proposal at
https://discourse.llvm.org/t/rfc-adding-instructions-to-to-carry-gep-type-traversal-information/88141, including the additional limitation on notional over-indexing I mention in my first reply. Or some other approach that provides the necessary guarantees at the IR level.
Understandable. In that case, I shall close this PR and open a new PR when there are LLVM IR semantics able to provide the guarantees needed for this transformation.
Fixes #147109 and #160773
This PR implements "struct alloca decomposition" into SROA to be used by backends (namely, DirectX / DXIL) which do not allow alloca instructions with struct types.
Inclusion of struct alloca decomposition to SROA
Struct alloca decomposition eliminates struct-based alloca instructions by replacing them with separate alloca instructions for each of the struct members. It can be thought of as an Array-of-Structs to Struct-of-Arrays transformation without the creation of a new struct to hold all the arrays.
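As a rough source-level analogy (not the IR that the pass produces), the rewrite can be pictured as replacing one array of structs with one array per member. The `Particle` type and both function names below are hypothetical and exist only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical struct standing in for the allocated struct type.
struct Particle {
  float Mass;
  int Id;
};

constexpr std::size_t N = 4;

// Before: a single array-of-structs object, analogous to one
// struct-based alloca.
int sumIdsAoS() {
  std::array<Particle, N> Ps{};
  for (std::size_t I = 0; I < N; ++I) {
    Ps[I].Mass = 1.5f * static_cast<float>(I);
    Ps[I].Id = static_cast<int>(I);
  }
  int Sum = 0;
  for (const Particle &P : Ps)
    Sum += P.Id;
  return Sum;
}

// After: one array per member, analogous to one alloca per struct member,
// with no wrapper struct holding the arrays.
int sumIdsDecomposed() {
  std::array<float, N> Mass{};
  std::array<int, N> Id{};
  for (std::size_t I = 0; I < N; ++I) {
    Mass[I] = 1.5f * static_cast<float>(I);
    Id[I] = static_cast<int>(I);
  }
  int Sum = 0;
  for (int V : Id)
    Sum += V;
  return Sum;
}
```

Both functions compute the same result; only the storage layout differs, which is the property the decomposition relies on.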
This implementation of struct alloca decomposition currently only supports the decomposition of struct-based alloca instructions used by GEPs, memset, memcpy/memmove, and lifetime intrinsics.
It also relies on the presence of struct-typed GEPs to determine the intended struct member being accessed by each instruction. This information is used to redirect pointers to the correct alloca instruction for the corresponding struct member. Clang creates such GEPs, which remain present the first time the SROA pass is run in the optimization pipeline.
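The legality check described above can be sketched outside of LLVM as a walk over a simplified list of uses. This is a minimal model under stated assumptions, not the code in the patch: `UseKind`, `Use`, and `firstBlockingUse` are hypothetical names, and details such as following GEPs that lack a member index or comparing source and destination types of memcpy/memmove are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified model of the users of one struct alloca.
enum class UseKind { GEP, MemSet, MemTransfer, Lifetime, Other };

struct Use {
  UseKind Kind;
  // GEP only: the struct member index, if it is a constant.
  std::optional<unsigned> ConstMemberIdx;
  // MemSet / MemTransfer only: the byte length, if it is a constant.
  std::optional<std::uint64_t> ConstLength;
};

// Returns the position of the first use that blocks decomposition, or
// std::nullopt if every use is supported.
std::optional<std::size_t> firstBlockingUse(const std::vector<Use> &Uses,
                                            std::uint64_t StructSizeInBytes) {
  assert(StructSizeInBytes != 0 && "sketch assumes a non-empty struct");
  for (std::size_t I = 0; I < Uses.size(); ++I) {
    const Use &U = Uses[I];
    switch (U.Kind) {
    case UseKind::GEP:
      // The member index must be a constant to pick a member alloca.
      if (!U.ConstMemberIdx)
        return I;
      break;
    case UseKind::MemSet:
    case UseKind::MemTransfer:
      // The length must be constant and cover whole structs only.
      if (!U.ConstLength || *U.ConstLength % StructSizeInBytes != 0)
        return I;
      break;
    case UseKind::Lifetime:
      break;
    case UseKind::Other:
      // Any other user aborts the analysis.
      return I;
    }
  }
  return std::nullopt;
}
```

Returning the position of the offending use mirrors the way the analysis in the patch records the first instruction that caused the abort, which is useful for debug output.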
SROA pass options changes
With this change, the SROA pass options have become a struct so that struct alloca decomposition can be toggled alongside the existing preserve-cfg option.
A single-bool SROAPass constructor for the PreserveCFG option is kept so that the rest of the LLVM codebase that instantiates the SROAPass needs minimal changes. This single-bool constructor has struct alloca decomposition disabled by default, as the decomposition is not safe to apply for all languages and backends.
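One possible shape for such an options struct is sketched below. Only the names `SROAOptions`, `PCFGOption`, `DSOption`, `PreserveCFG`, and `DecomposeStructs` appear in the visible part of the diff; the enum type names, the `ModifyCFG` and `NoDecomposeStructs` enumerators, the use of a plain `bool` parameter, and the `SROAPassSketch` class are assumptions made for illustration.

```cpp
#include <cassert>

// Sketch only: everything except PCFGOption, DSOption, PreserveCFG and
// DecomposeStructs is an assumed name, not taken from the patch.
struct SROAOptions {
  enum PreserveCFGOption { ModifyCFG, PreserveCFG };
  enum DecomposeStructsOption { NoDecomposeStructs, DecomposeStructs };

  PreserveCFGOption PCFGOption = ModifyCFG;
  // Decomposition is off unless a caller asks for it explicitly.
  DecomposeStructsOption DSOption = NoDecomposeStructs;
};

// Hypothetical stand-in for the pass class.
class SROAPassSketch {
  SROAOptions Options;

public:
  explicit SROAPassSketch(const SROAOptions &Opts) : Options(Opts) {}

  // Compatibility constructor: selects only the CFG behaviour and leaves
  // struct decomposition at its disabled default.
  explicit SROAPassSketch(bool PreserveCFG) {
    Options.PCFGOption =
        PreserveCFG ? SROAOptions::PreserveCFG : SROAOptions::ModifyCFG;
  }

  bool preservesCFG() const {
    return Options.PCFGOption == SROAOptions::PreserveCFG;
  }
  bool decomposesStructs() const {
    return Options.DSOption == SROAOptions::DecomposeStructs;
  }
};
```

Keeping the compatibility constructor means existing call sites compile unchanged and cannot enable the decomposition by accident.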
The flag decompose-structs has also been added as a new optional parameter to the SROA pass in PassBuilder, with the default being to disable struct alloca decomposition if the parameter is not present.
TargetTransformInfo changes
To enable struct alloca decomposition when compiling for the DirectX / DXIL backend, an additional TargetTransformInfo entry called shouldDecomposeStructAllocas is added. It returns true if the target does not support struct alloca instructions and therefore requires them to be decomposed.
The SROA pass is modified to depend on the TargetTransformInfo pass. If shouldDecomposeStructAllocas is true for the target, SROA runs even if skipFunction() is true (such as when the function has the OptimizeNone attribute), so SROA runs even with -O0 if the target requires struct alloca decomposition.
Notes
It should be noted that struct alloca decomposition is unsafe in languages that allow dynamic indexing across struct members (e.g., treating a pointer to one member as a base for accessing others via computed offsets). Such behavior can violate the assumptions of this decomposition, which expects each member to be accessed independently and explicitly.
Therefore struct alloca decomposition is disabled by default and must be explicitly enabled via an SROA pass option or TargetTransformInfo via shouldDecomposeStructAllocas.
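To make the hazard concrete, here is a small illustration of the kind of access that depends on members being contiguous. The `Pair` type and `readBViaA` function are hypothetical; the example reaches member `B` by applying a computed byte offset to the address of member `A`, which only works while both members live in the same object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical two-member struct.
struct Pair {
  int A;
  int B;
};

// Reads B starting from the address of A, using a computed byte offset
// instead of an explicit member access.
int readBViaA(const Pair &P) {
  const unsigned char *Base = reinterpret_cast<const unsigned char *>(&P.A);
  std::size_t Delta = offsetof(Pair, B) - offsetof(Pair, A);
  int Out;
  std::memcpy(&Out, Base + Delta, sizeof(Out));
  return Out;
}
```

If `A` and `B` were placed in two independent allocations, `Base + Delta` would no longer be guaranteed to land on `B`, which is why the decomposition requires every access to name its member explicitly.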